Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OCPBUGS-1484: rebase on upstream 1.25.0 #241

Merged
merged 287 commits into from
Sep 29, 2022

Conversation

elmiko
Copy link

@elmiko elmiko commented Sep 15, 2022

This commit rebases the autoscaler on top of the Kubernetes/Autoscaler 1.25.0 release. There are several commits that we carry on top of the upstream autoscaler and the rebase process allows us to preserve those. Here is a description of the process I used to create this PR.

(inspired by the commit description for the 1.18 rebase. pr #139)

Process

First we need to identify the carry commits that we currently have, this is done against our previous rebase to catch new changes. Once identified we will drop commits which have merged upstream and only carry unique commits. (see below for the carried and dropped commits).

Identify carry commits (run from the openshift/master branch)

git log -n 20 --oneline --no-merges

this is slightly different from previous rebases in that the history has gotten slightly wonky and i needed to manually inspect the carry patches.

After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase and then cherry pick the carry commits into the new branch. The following commands cover these steps:

$ git remote update # make sure we update our refs
$ git checkout cluster-autoscaler-1.25.0
$ git checkout -b merge-tmp # create a temporary branch for our merge commit
$ git checkout openshift/master # we want to be at the tip of the openshift master branch when we run the next command
$ echo 'merge upstream/cluster-autoscaler-1.25.0' | git commit-tree merge-tmp^{tree} -p HEAD -p merge-tmp -F - # create a new merge commit for our history
deadbeef12345678 # id of new merge commit
$ git branch merge-1.25.0 deadbeef12345678 # create a new branch for the cherry-pick work
$ git checkout merge-1.25.0
$ git cherry-pick <carry commits> # cherry pick the needed commits into the new branch

With the merge-1.19.0 branch in place, I cherry picked the carry commits which applied, resolved merge conflicts, and finally tested the resulting tree against the unit test and end-to-end suite.

Carried Commits

These commits are for features which have not yet been accepted upstream, are integral to our CI platform, or are specific to the releases we create for OpenShift.

4baab0ccd UPSTREAM: <carry>: revert capacity annotations
b031b6a17 UPSTREAM: <carry>: Updating vertical-pod-autoscaler images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/2e5eba0108299087402c119ba0dc979a0a6c0bd1/images/vertical-pod-autoscaler.yml
ad7a10a1d UPSTREAM: <carry>: Have VPA ignore phantom containers named "POD"
e6f39989e UPSTREAM: <carry>: Handle old Machine API specific machine delete annotation
c1b8c5623 UPSTREAM: <carry>: Rename FailureMessage to ErrorMessage
8f4ca337b UPSTREAM: <carry>: configure repository for OpenShift releases

Dropped Commits

These commits were carried over the 1.24.x release history and represent work that has been accepted upstream.

96336e7c6 UPSTREAM: <drop>: Update dependencies for Cluster Autoscaler 1.24.0
c7e7fa94c UPSTREAM: <carry>: Implement scale from zero
0b9add501 UPSTREAM: <carry>: Revert "Adding config for event filtering"

of special note in this rebase is this commit

4baab0ccd UPSTREAM: <carry>: revert capacity annotations

due to the scale from zero changes being accepted upstream we can now drop our carried patch. but, the upstream implementation has differed slightly from our's (mainly around annotation names). we will need to carry this patch until we can fix all the providers to properly use the new annotations.

vishalanarase and others added 30 commits June 6, 2022 11:36
Signed-off-by: Vishal Anarse <[email protected]>
Signed-off-by: Vishal Anarse <[email protected]>
Signed-off-by: Vishal Anarse <[email protected]>
Signed-off-by: Vishal Anarse <[email protected]>
Signed-off-by: Vishal Anarse <[email protected]>
Signed-off-by: Vishal Anarse <[email protected]>
…er-firewall

[CA]: hetzner cloud firewall feature
Part of the test verifies if all taint updates happened as expected.
The taints are verified asynchronously, and the test waits for exactly
as many taint updates as defined in the test case. A couple of test
cases were missing some expected updates (clearing the taint if
drain/deletion fails). The test could randomly fail if one of the
missing updates happened to apear before one of the expected updates.
This commit adds the missing expected updates, all should be accounted
for now.

This commit also adds a sync point to wait for all expected node
deletion results before asserting them. Without it, the test would
sometimes move to the assertion before the results were actually
reported.
CA: fix flakiness in actuation.TestStartDeletion
Signed-off-by: Vishal Anarse <[email protected]>
Design-proposals have been archived and are no longer available from the community repo.
Correct invalid GCE instances pricing
Adapt links to original design-proposal in README
 Restrict the GitHub token permissions only to the required ones; this way, even if the attackers will succeed in compromising your workflow, they won’t be able to do much.

- Included permissions for the action. https://github.com/ossf/scorecard/blob/main/docs/checks.md#token-permissions

https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#permissions

https://docs.github.com/en/actions/using-jobs/assigning-permissions-to-jobs

[Keeping your GitHub Actions and workflows secure Part 1: Preventing pwn requests](https://securitylab.github.com/research/github-actions-preventing-pwn-requests/)

Signed-off-by: neilnaveen <[email protected]>
@elmiko
Copy link
Author

elmiko commented Sep 21, 2022

/test e2e-aws-operator

3 similar comments
@elmiko
Copy link
Author

elmiko commented Sep 21, 2022

/test e2e-aws-operator

@elmiko
Copy link
Author

elmiko commented Sep 21, 2022

/test e2e-aws-operator

@elmiko
Copy link
Author

elmiko commented Sep 22, 2022

/test e2e-aws-operator

@JoelSpeed
Copy link

/approve
/lgtm
/retest-required

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Sep 22, 2022
@openshift-ci
Copy link

openshift-ci bot commented Sep 22, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: JoelSpeed

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 22, 2022
@openshift-ci-robot
Copy link

/retest-required

Remaining retests: 0 against base HEAD ff8007f and 2 for PR HEAD e827295 in total

@elmiko
Copy link
Author

elmiko commented Sep 22, 2022

/retest-required

2 similar comments
@elmiko
Copy link
Author

elmiko commented Sep 22, 2022

/retest-required

@elmiko
Copy link
Author

elmiko commented Sep 23, 2022

/retest-required

@openshift-ci
Copy link

openshift-ci bot commented Sep 23, 2022

@elmiko: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/e2e-gcp-operator e827295 link false /test e2e-gcp-operator
ci/prow/e2e-azure-operator e827295 link false /test e2e-azure-operator
ci/prow/git-history e827295 link false /test git-history

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@elmiko
Copy link
Author

elmiko commented Sep 26, 2022

i am running the test suite locally, there might an issue with the balancing test. will report back with more details

@elmiko
Copy link
Author

elmiko commented Sep 26, 2022

it appears that when nodes are marked as SchedulingDisabled they don't end up counting towards the maximum node count that the autoscaler maintains. this is causing the autoscaler to create too many nodes and thus failing the test. i am continuing to investigate, but we might need to change the way the tests detect the size of the cluster.

@elmiko
Copy link
Author

elmiko commented Sep 26, 2022

i have run the tests locally and see them pass when run individually, but there is a condition where some of our tests leave nodes in this scheduling disabled state and that seems to be affecting the other tests. i am continuing to dig into this. i feel confident that the rebase works, but it appears we have some dissonance in our test structure.

@elmiko
Copy link
Author

elmiko commented Sep 27, 2022

i examined the autoscaler code at length and it appears that it will only count nodes that have the Ready condition when calculating the maximum cluster size (i am going to open a docs PR upstream). because we have some tests that are leaving nodes in the cluster that have a NoSchedule taint, or other non-ready state, it is causing the maximum calculations to be off. i have created a patch for the tests to make the node count more accurate, this should address some of the issues we are seeing here.

openshift/cluster-api-actuator-pkg#241

i am still investigating the tests to see how they can be made more accurate.

@elmiko
Copy link
Author

elmiko commented Sep 28, 2022

just adding a note here, i think we are seeing a bad interaction between our test suites. i have run the failing tests individually and they pass in isolation, but when running the entire suite there are some issues that seem to be affecting how the workload from the test package are being scheduled. i have updated the other PR to help better isolate the test environments.

@elmiko
Copy link
Author

elmiko commented Sep 28, 2022

i am confident that we are passing all the autoscaler e2e tests as i have run them all manually. talked with @jspeed a little about this, and we agreed to override the e2e-aws-operator test. i would like to see if we can get the tests upgraded and see if we can get this to pass, so we will take another day to get the tests updated and passing here, if possible.

@elmiko
Copy link
Author

elmiko commented Sep 29, 2022

i am going to override the e2e-aws-operator test as we believe it is misbehaving currently. i wanted to get the testing fix in place in hopes that we could see the tests pass here, but now /that/ PR is also having issues. @JoelSpeed and i have talked about this and @sunzhaohua2 has confirmed the behavior of the autoscaler.

/override ci/prow/e2e-aws-operator

@openshift-ci
Copy link

openshift-ci bot commented Sep 29, 2022

@elmiko: Overrode contexts on behalf of elmiko: ci/prow/e2e-aws-operator

In response to this:

i am going to override the e2e-aws-operator test as we believe it is misbehaving currently. i wanted to get the testing fix in place in hopes that we could see the tests pass here, but now /that/ PR is also having issues. @JoelSpeed and i have talked about this and @sunzhaohua2 has confirmed the behavior of the autoscaler.

/override ci/prow/e2e-aws-operator

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-merge-robot openshift-merge-robot merged commit b700d9a into openshift:master Sep 29, 2022
@elmiko elmiko deleted the merge-1.25.0 branch September 29, 2022 17:55
Fedosin added a commit to Fedosin/rebasebot that referenced this pull request Oct 5, 2022
This patch makes the process of rebasing similar to what we do manually.
For instance, in openshift/kubernetes-autoscaler#241
@sdodson
Copy link
Member

sdodson commented Oct 5, 2022

/retitle OCPBUGS-1484: rebase on upstream 1.25.0

@openshift-ci openshift-ci bot changed the title rebase on upstream 1.25.0 OCPBUGS-1484: rebase on upstream 1.25.0 Oct 5, 2022
@openshift-ci-robot
Copy link

@elmiko: Some pull requests linked via external trackers have merged:

The following pull requests linked via external trackers have not merged:

These pull request must merge or be unlinked from the Jira bug in order for it to move to the next state. Once unlinked, request a bug refresh with /jira refresh.

Jira Issue OCPBUGS-1484 has not been moved to the MODIFIED state.

In response to this:

This commit rebases the autoscaler on top of the Kubernetes/Autoscaler 1.25.0 release. There are several commits that we carry on top of the upstream autoscaler and the rebase process allows us to preserve those. Here is a description of the process I used to create this PR.

(inspired by the commit description for the 1.18 rebase. pr #139)

Process

First we need to identify the carry commits that we currently have, this is done against our previous rebase to catch new changes. Once identified we will drop commits which have merged upstream and only carry unique commits. (see below for the carried and dropped commits).

Identify carry commits (run from the openshift/master branch)

git log -n 20 --oneline --no-merges

this is slightly different from previous rebases in that the history has gotten slightly wonky and i needed to manually inspect the carry patches.

After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase and then cherry pick the carry commits into the new branch. The following commands cover these steps:

$ git remote update # make sure we update our refs
$ git checkout cluster-autoscaler-1.25.0
$ git checkout -b merge-tmp # create a temporary branch for our merge commit
$ git checkout openshift/master # we want to be at the tip of the openshift master branch when we run the next command
$ echo 'merge upstream/cluster-autoscaler-1.25.0' | git commit-tree merge-tmp^{tree} -p HEAD -p merge-tmp -F - # create a new merge commit for our history
deadbeef12345678 # id of new merge commit
$ git branch merge-1.25.0 deadbeef12345678 # create a new branch for the cherry-pick work
$ git checkout merge-1.25.0
$ git cherry-pick <carry commits> # cherry pick the needed commits into the new branch

With the merge-1.19.0 branch in place, I cherry picked the carry commits which applied, resolved merge conflicts, and finally tested the resulting tree against the unit test and end-to-end suite.

Carried Commits

These commits are for features which have not yet been accepted upstream, are integral to our CI platform, or are specific to the releases we create for OpenShift.

4baab0ccd UPSTREAM: <carry>: revert capacity annotations
b031b6a17 UPSTREAM: <carry>: Updating vertical-pod-autoscaler images to be consistent with ART Reconciling with https://github.com/openshift/ocp-build-data/tree/2e5eba0108299087402c119ba0dc979a0a6c0bd1/images/vertical-pod-autoscaler.yml
ad7a10a1d UPSTREAM: <carry>: Have VPA ignore phantom containers named "POD"
e6f39989e UPSTREAM: <carry>: Handle old Machine API specific machine delete annotation
c1b8c5623 UPSTREAM: <carry>: Rename FailureMessage to ErrorMessage
8f4ca337b UPSTREAM: <carry>: configure repository for OpenShift releases

Dropped Commits

These commits were carried over the 1.24.x release history and represent work that has been accepted upstream.

96336e7c6 UPSTREAM: <drop>: Update dependencies for Cluster Autoscaler 1.24.0
c7e7fa94c UPSTREAM: <carry>: Implement scale from zero
0b9add501 UPSTREAM: <carry>: Revert "Adding config for event filtering"

of special note in this rebase is this commit

4baab0ccd UPSTREAM: <carry>: revert capacity annotations

due to the scale from zero changes being accepted upstream we can now drop our carried patch. but, the upstream implementation has differed slightly from our's (mainly around annotation names). we will need to carry this patch until we can fix all the providers to properly use the new annotations.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged.
Projects
None yet
Development

Successfully merging this pull request may close these issues.